[[bbv2.arch]]
B2 v2 architecture
------------------

This document is a work in progress. Do not expect much from it yet.

[[bbv2.arch.overview]]
Overview
--------

The B2 implementation is structured into four components: "kernel",
"util", "build" and "tools". The first two are relatively
uninteresting, so we will focus on the remaining pair. The "build"
component provides the classes necessary to declare targets, determine
which properties should be used to build them, and create the
dependency graph. The "tools" component provides user-visible
functionality. It mostly allows declaring specific kinds of main
targets, as well as registering available tools, which are then used
when creating the dependency graph.

[[bbv2.arch.build]]
The build layer
---------------

The build layer has just four main parts -- metatargets (abstract
targets), virtual targets, generators and properties.

* Metatargets (see the "targets.jam" module) represent all the
user-defined entities that can be built. The "meta" prefix signifies
that they do not need to correspond to exact files or even files at all
-- they can produce a different set of files depending on the build
request. Metatargets are created when Jamfiles are loaded. Each has a
`generate` method which is given a property set and produces virtual
targets for the passed properties.
* Virtual targets (see the "virtual-target.jam" module) correspond to
actual atomic updatable entities -- most typically files.
* Properties are just (name, value) pairs, specified by the user and
describing how targets should be built. Properties are stored using the
`property-set` class.
* Generators are objects that encapsulate specific tools -- they can
take a list of source virtual targets and produce new virtual targets
from them.

The build process includes the following steps:

1.  Top-level code calls the `generate` method of a metatarget with some
properties.
2.  The metatarget combines the requested properties with its
requirements and passes the result, together with the list of sources,
to the `generators.construct` function.
3.  A generator appropriate for the build properties is selected and its
`run` method is called. The method returns a list of virtual targets.
4.  The virtual targets are returned to the top-level code, and for each
instance, the `actualize` method is called to set up nodes and updating
actions in the dependency graph kept inside the B2 engine. This
dependency graph is then updated, which runs the necessary commands.
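
The four steps above can be modelled in a few lines. The following
Python sketch is only an illustration of the control flow, not B2 code:
every class and function name in it is hypothetical, generator selection
is reduced to a single hard-coded generator, and the "engine graph" is a
plain dictionary.

```python
# Toy model of the four build steps; none of these names are B2's real API.

class VirtualTarget:
    def __init__(self, name, sources, action):
        self.name, self.sources, self.action = name, sources, action

    def actualize(self, graph):
        # Step 4: register a node and its updating action in the engine graph.
        graph[self.name] = (self.action, list(self.sources))


class CompileGenerator:
    def run(self, name, properties, sources):
        # Step 3: turn source targets into new virtual targets.
        return [VirtualTarget(name + ".o", sources, "compile")]


def construct(name, properties, sources):
    # Stand-in for generator selection; only one generator exists here.
    return CompileGenerator().run(name, properties, sources)


class Metatarget:
    def __init__(self, name, sources, requirements):
        self.name, self.sources, self.requirements = name, sources, requirements

    def generate(self, requested):
        # Step 2: combine the requested properties with the requirements.
        properties = {**requested, **self.requirements}
        return construct(self.name, properties, self.sources)


# Step 1: top-level code asks a metatarget to generate virtual targets.
graph = {}
metatarget = Metatarget("a", ["a.cpp"], {"threading": "multi"})
for virtual_target in metatarget.generate({"variant": "debug"}):
    virtual_target.actualize(graph)
```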

[[bbv2.arch.build.metatargets]]
Metatargets
~~~~~~~~~~~

There are several classes derived from "abstract-target". The
"main-target" class represents a top-level main target, the
"project-target" class acts like a container holding multiple main
targets, and the "basic-target" class is a base class for all further
target types.

Since each main target can have several alternatives, all top-level
target objects are actually containers, referring to "real" main target
classes. The type of that container is "main-target". For example,
given:

....
alias a ;
lib a : a.cpp : <toolset>gcc ;
....

we would have one top-level "main-target" instance, containing one
"alias-target" and one "lib-target" instance. The "generate" method of
"main-target" decides which of the alternatives should be used, and calls
"generate" on the corresponding instance.

Each alternative is an instance of a class derived from "basic-target".
"basic-target.generate" does several things that should always be done:

* Determines what properties should be used for building the target.
This includes looking at requested properties, requirements, and usage
requirements of all sources.
* Builds all sources.
* Computes usage requirements that should be passed back to targets
depending on this one.

For the real work of constructing a virtual target, a new method
"construct" is called.

The "construct" method can be implemented in any way by classes derived
from "basic-target", but one specific derived class plays the central
role -- "typed-target". That class holds the desired type of file to be
produced, and its "construct" method uses the generators module to do
the actual work.
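
The split between "generate" and "construct" is a template-method
arrangement. The Python sketch below models it under simplifying
assumptions: the class names mirror the prose, but the method bodies,
the way properties are merged, and the returned tuples are invented for
illustration and do not reflect how B2 actually computes properties.

```python
# Template-method sketch; bodies are invented, only the shape follows the text.

class BasicTarget:
    def __init__(self, name, sources=(), requirements=None, usage_requirements=None):
        self.name = name
        self.sources = list(sources)            # other BasicTarget instances
        self.requirements = requirements or {}
        self.usage_requirements = usage_requirements or {}

    def generate(self, requested):
        properties = dict(requested)
        built_sources = []
        # Build all sources and fold in their usage requirements.
        for source in self.sources:
            usage, targets = source.generate(requested)
            properties.update(usage)
            built_sources.extend(targets)
        # In this toy model, the target's own requirements are applied last.
        properties.update(self.requirements)
        targets = self.construct(properties, built_sources)
        # Usage requirements are passed back to targets depending on this one.
        return self.usage_requirements, targets

    def construct(self, properties, sources):
        raise NotImplementedError


class TypedTarget(BasicTarget):
    def __init__(self, name, target_type, **kwargs):
        super().__init__(name, **kwargs)
        self.target_type = target_type

    def construct(self, properties, sources):
        # A real implementation would call into the generators module here.
        return [(self.name, self.target_type, tuple(sorted(properties.items())))]


lib = TypedTarget("z", "LIB", usage_requirements={"define": "USE_Z"})
exe = TypedTarget("app", "EXE", sources=[lib], requirements={"threading": "multi"})
usage, targets = exe.generate({"variant": "debug"})
```

Here the usage requirement of the library ends up in the property list
used to construct the executable, which is the effect the list above
describes.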

This means that a specific metatarget subclass may avoid using
generators altogether. However, this is deprecated and we are trying
to eliminate all such subclasses at the moment.

Note that the `build/targets.jam` file contains a UML diagram which
might help.

[[bbv2.arch.build.virtual]]
Virtual targets
~~~~~~~~~~~~~~~

Virtual targets are atomic updatable entities. Each virtual target can
be assigned an updating action -- an instance of the `action` class. The
action class, in turn, contains a list of source targets, properties,
and the name of the action which should be executed.

We try hard to never create equal instances of the `virtual-target`
class. Code creating virtual targets passes them through the
`virtual-target.register` function, which detects if a target with the
same name, sources, and properties has already been created. In that
case, the preexisting target is returned.

When all virtual targets are produced, they are "actualized". This means
that the real file names are computed, and the commands that should be
run are generated. This is done by the `virtual-target.actualize` and
`action.actualize` methods. The first is conceptually simple, while the
second needs additional explanation. Commands in the B2 engine are
generated in a two-stage process. First, a rule with an appropriate name
(for example "gcc.compile") is called and is given a list of target
names. The rule sets some variables, like "OPTIONS". After that, the
command string is taken and the variables are substituted, so a use of
OPTIONS inside the command string is transformed into the actual compile
options.

On top of the engine, B2 adds a third stage to simplify things. It is
now possible to automatically convert properties to appropriate variable
assignments. For example, <debug-symbols>on would add "-g" to the
OPTIONS variable, without this logic having to be added manually to
gcc.compile. This functionality is part of the "toolset" module.
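
The three stages can be pictured with the Python sketch below. It is a
model only: the flag table, the command template and all function names
are assumptions made for this example, and the real mechanism lives in
Jam rules and actions, not in Python.

```python
from string import Template

# Hypothetical property-to-variable table (the "third stage").
FLAGS = {
    ("debug-symbols", "on"): ("OPTIONS", "-g"),
    ("optimization", "speed"): ("OPTIONS", "-O3"),
}

# Hypothetical command string with variables to be substituted.
COMMAND = Template("g++ $OPTIONS -c -o $target $source")


def apply_flags(properties, variables):
    # Convert matching properties into variable assignments automatically.
    for prop in properties:
        if prop in FLAGS:
            name, value = FLAGS[prop]
            variables.setdefault(name, []).append(value)


def gcc_compile(variables):
    # Stand-in for the rule: it sets some variables by hand.
    variables.setdefault("OPTIONS", []).append("-Wall")


def build_command(target, source, properties):
    variables = {}
    apply_flags(properties, variables)
    gcc_compile(variables)
    # Final stage: substitute the variables into the command string.
    return COMMAND.substitute(
        OPTIONS=" ".join(variables["OPTIONS"]), target=target, source=source
    )


command = build_command("a.o", "a.cpp", [("debug-symbols", "on")])
```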

Note that the `build/virtual-target.jam` file contains a UML diagram
which might help.

[[bbv2.arch.build.properties]]
Properties
~~~~~~~~~~

Above, we noted that metatargets are built with a set of properties.
That set is represented by the `property-set` class. An important point
is that handling of property sets can get very expensive. For that
reason, we make sure that for each set of (name, value) pairs only one
`property-set` instance is created. The `property-set` class uses
extensive caching for all operations, so most work is avoided.
`property-set.create` is the factory function used to create instances
of the `property-set` class.
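
A rough Python model of this design is shown below. It assumes that a
property set is identified by its sorted, de-duplicated (name, value)
pairs; the `add` operation and its cache are hypothetical examples of a
cached operation, not a description of the real class.

```python
# One instance per distinct set of properties, plus per-instance caching.

class PropertySet:
    _instances = {}

    def __init__(self, properties):
        self.properties = properties    # canonical tuple of (name, value) pairs
        self._add_cache = {}

    @classmethod
    def create(cls, raw_properties):
        # Canonicalize so that order and duplicates do not matter.
        key = tuple(sorted(set(raw_properties)))
        if key not in cls._instances:
            cls._instances[key] = cls(key)
        return cls._instances[key]

    def add(self, other):
        # Because instances are unique, `other` itself can be the cache key.
        if other not in self._add_cache:
            merged = PropertySet.create(self.properties + other.properties)
            self._add_cache[other] = merged
        return self._add_cache[other]


a = PropertySet.create([("variant", "debug"), ("toolset", "gcc")])
b = PropertySet.create([("toolset", "gcc"), ("variant", "debug")])
static = PropertySet.create([("link", "static")])
combined = a.add(static)
```

Since equal sets share one instance, identity comparison is enough to
test equality, and results cached on one instance are reused everywhere
that set appears.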

[[bbv2.arch.tools]]
The tools layer
---------------

Write me!

[[bbv2.arch.targets]]
Targets
-------

NOTE: THIS SECTION IS NOT EXPECTED TO BE READ!

There are two user-visible kinds of targets in B2. The first are
"abstract" -- they correspond to things declared by the user, e.g.
projects and executable files. The primary thing about abstract targets
is that it is possible to request them to be built with a particular set
of properties. Each property combination may yield different built
files, so abstract targets do not have a direct correspondence to built
files.

File targets, on the other hand, are associated with concrete files.
Dependency graphs for abstract targets with specific properties are
constructed from file targets. The user has no way to create file
targets but can specify rules for detecting source file types, as well
as rules for transforming between file targets of different types. That
information is used in constructing the final dependency graph, as
described in the link:#bbv2.arch.depends[next section]. **Note:** File
targets are not the same entities as Jam targets; the latter are created
from file targets at the latest possible moment. *Note:* "File target"
is the originally proposed name for what we now call virtual targets. It
is more understandable by users, but has one problem: virtual targets
can potentially be "phony", and not correspond to any file.

[[bbv2.arch.depends]]
Dependency scanning
-------------------

Dependency scanning is the process of finding implicit dependencies,
like "#include" statements in {CPP}. The requirements for a correct
dependency scanning mechanism are:

* link:#bbv2.arch.depends.different-scanning-algorithms[Support for
different scanning algorithms]. {CPP} and XML have quite different syntax
for includes and rules for looking up the included files.
* link:#bbv2.arch.depends.same-file-different-scanners[Ability to scan
the same file several times]. For example, a single {CPP} file may be
compiled using different include paths.
* link:#bbv2.arch.depends.dependencies-on-generated-files[Proper
detection of dependencies on generated files].
* link:#bbv2.arch.depends.dependencies-from-generated-files[Proper
detection of dependencies from a generated file].

[[bbv2.arch.depends.different-scanning-algorithms]]
Support for different scanning algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different scanning algorithms are encapsulated by objects called
"scanners". Please see the "scanner" module documentation for more
details.

[[bbv2.arch.depends.same-file-different-scanners]]
Ability to scan the same file several times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As stated above, it is possible to compile a {CPP} file multiple times,
using different include paths. Therefore, include dependencies for those
compilations can be different. The problem is that the B2 engine
does not allow multiple scans of the same target. To solve that, we pass
the scanner object when calling `virtual-target.actualize` and it
creates different engine targets for different scanners.

For each engine target created with a specified scanner, a corresponding
one is created without it. The updating action is associated with the
scanner-less target, and the target with the scanner is made to depend
on it. That way, if sources for that action are touched, all targets,
with and without the scanner, are considered outdated.

Consider the following example: "a.cpp" prepared from "a.verbatim",
compiled by two compilers using different include paths and copied into
some install location. The dependency graph would look like:

....
a.o (<toolset>gcc)        <--(compile)-- a.cpp (scanner1) ----+
a.o (<toolset>msvc)       <--(compile)-- a.cpp (scanner2) ----|
a.cpp (installed copy)    <--(copy) ----------------------- a.cpp (no scanner)
                                                                 ^
                                                                 |
                       a.verbatim -------------------------------+
....
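
The pairing of scanner-specific and scanner-less engine targets can be
sketched as follows. This Python fragment is a toy model: the
`<scanner>name` naming scheme and the `actualize` signature are
assumptions chosen to match the diagrams in this document, not the
actual engine interface.

```python
# Toy model: one engine target per (file, scanner) pair.

def actualize(name, scanner, depends):
    """Return the engine target for `name`, creating graph entries on demand.

    `depends` maps an engine target to the set of targets it depends on.
    """
    depends.setdefault(name, set())     # scanner-less target owns the action
    if scanner is None:
        return name
    scanned = f"<{scanner}>{name}"
    # The scanner-specific target depends on the scanner-less one, so touching
    # the sources of the updating action outdates every variant.
    depends.setdefault(scanned, set()).add(name)
    return scanned


depends = {}
gcc_source = actualize("a.cpp", "scanner1", depends)
msvc_source = actualize("a.cpp", "scanner2", depends)
copy_source = actualize("a.cpp", None, depends)
depends["a.cpp"].add("a.verbatim")      # the action that prepares a.cpp
```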

[[bbv2.arch.depends.dependencies-on-generated-files]]
Proper detection of dependencies on generated files.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This requirement breaks down into the following ones.

1.  If, when compiling "a.cpp", there is an include of "a.h", the "dir"
directory is on the include path, and a target called "a.h" will be
generated in "dir", then B2 should discover the include and
create "a.h" before compiling "a.cpp".
2.  Since B2 almost always generates targets under the "bin"
directory, this should be supported as well. That is, in the scenario
above, the Jamfile in "dir" might create a main target which generates
"a.h". The file will be generated in the "dir/bin" directory, but we
still have to recognize the dependency.

The first requirement means that when determining what "a.h" means when
found in "a.cpp", we have to iterate over all directories in include
paths, checking for each one:

1.  If there is a file named "a.h" in that directory, or
2.  If there is a target called "a.h", which will be generated in
that directory.
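
The lookup described above can be written down as a short Python
function. This is a sketch under stated assumptions: the function name,
the `generated` set of to-be-generated paths and the injectable
`file_exists` check are all hypothetical, and existing files are given
priority over generated targets within a directory.

```python
import posixpath


def resolve_include(header, include_path, generated, file_exists):
    """Find what `header` refers to, searching `include_path` in order.

    `generated` is a set of paths that some action will produce.
    `file_exists` is a callable so the check can be faked in examples.
    """
    for directory in include_path:
        candidate = posixpath.join(directory, header)
        if file_exists(candidate):
            return ("file", candidate)
        if candidate in generated:
            return ("generated", candidate)
    return None


search_path = ["d1", "d2", "d3"]
generated = {"d2/header.h"}

# No file on disk yet: the generated target in d2 is found.
found_generated = resolve_include("header.h", search_path, generated, lambda p: False)

# A native header earlier on the path wins over the generated one.
found_native = resolve_include(
    "header.h", search_path, generated, lambda p: p == "d1/header.h"
)
```

Searching directory by directory is what keeps a generated "dummy"
header from being picked up when a native one appears earlier on the
include path.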

Classic Jam has built-in facilities for point (1) above, but that is not
enough. It is hard to implement the right semantics without built-in
support. For example, we could try to check if there exists a target
called "a.h" somewhere in the dependency graph, and add a dependency on
it. The problem is that without a file search in the include path, the
semantics may be incorrect. For example, one can have an action that
generates some "dummy" header for systems which do not have a native
one. Naturally, we do not want to depend on that generated header on
platforms where a native one is included.

There are two design choices for built-in support. Suppose we have files
a.cpp and b.cpp, and each one includes header.h, generated by some
action. The dependency graph created by classic Jam would look like:

....
a.cpp -----> <scanner1>header.h  [search path: d1, d2, d3]

                  <d2>header.h  --------> header.y
                  [generated in d2]

b.cpp -----> <scanner2>header.h  [search path: d1, d2, d4]
....

In this case, Jam thinks the header.h targets are unrelated. The
correct dependency graph might be:

....
a.cpp ----
          \
           >---->  <d2>header.h  --------> header.y
          /       [generated in d2]
b.cpp ----
....

or

....
a.cpp -----> <scanner1>header.h  [search path: d1, d2, d3]
                          |
                       (includes)
                          V
                  <d2>header.h  --------> header.y
                  [generated in d2]
                          ^
                      (includes)
                          |
b.cpp -----> <scanner2>header.h [ search path: d1, d2, d4]
....

The first alternative was used for some time. The problem, however, is:
what include paths should be used when scanning header.h? The second
alternative was suggested by Matt Armstrong. It has a similar effect:
any target depending on <scanner1>header.h will also depend on
<d2>header.h. This way, however, we have two different targets with two
different scanners, so those targets can be scanned independently. The
first alternative's problem is avoided, so the second alternative is
the one implemented now.

The second sub-requirement is that targets generated under the "bin"
directory are handled as well. B2 implements a semi-automatic
approach. When compiling {CPP} files the process is:

1.  The main target to which the compiled file belongs is found.
2.  All other main targets that the found one depends on are found.
These include: main targets used as sources as well as those specified
as "dependency" properties.
3.  All directories where files belonging to those main targets will be
generated are added to the include path.

After this is done, dependencies are found by the approach explained
previously.
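
As a rough illustration of the three steps, the Python sketch below
collects extra include directories for one main target. The dictionaries
describing sources, "dependency" properties and output directories are
invented inputs, and the sketch only follows direct dependencies, which
is an assumption rather than something this document specifies.

```python
# Hypothetical helper: gather output directories of related main targets.

def extra_include_dirs(main_target, sources, dependencies, output_dirs):
    # Step 2: main targets used as sources, plus "dependency" properties.
    related = list(sources.get(main_target, [])) + list(
        dependencies.get(main_target, [])
    )
    # Step 3: add every directory where those targets generate files.
    dirs = []
    for target in related:
        for directory in output_dirs.get(target, []):
            if directory not in dirs:
                dirs.append(directory)
    return dirs


sources = {"app": ["parser"]}
dependencies = {"app": ["config-header"]}
output_dirs = {
    "parser": ["parser/bin/debug"],
    "config-header": ["config/bin/debug"],
}
include_dirs = extra_include_dirs("app", sources, dependencies, output_dirs)
```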

Note that if a target uses generated headers from another main target,
that main target should be explicitly specified using the dependency
property. It would be better to lift this requirement, but it does not
seem to be causing any problems in practice.

For target types other than {CPP}, adding of include paths must be
implemented anew.

[[bbv2.arch.depends.dependencies-from-generated-files]]
Proper detection of dependencies from generated files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose file "a.cpp" includes "a.h" and both are generated by some
action. Note that classic Jam has two stages. In the first stage the
dependency graph is built and actions to be run are determined. In the
second stage the actions are executed. Initially, neither file exists,
so the include is not found. As a result, Jam might attempt to compile
a.cpp before creating a.h, causing the compilation to fail.

The solution in Boost.Jam is to perform additional dependency scans
after targets are updated. This breaks the separation between build
stages in Jam -- which some people consider a good thing -- but I am not
aware of any better solution.

To understand the rest of this section, you should first read some
details about Jam's dependency scanning, available at
http://public.perforce.com:8080/@md=d&cd=//public/jam/src/&ra=s&c=kVu@//2614?ac=10[this
link].

Whenever a target is updated, Boost.Jam rescans it for includes.
Consider this graph, created before any actions are run.

....
A -------> C ----> C.pro
     /
B --/         C-includes   ---> D
....

Both A and B have a dependency on C and C-includes (the latter
dependency is not shown). Say that during building we have tried to
create A, then tried to create C, and successfully created C.

In that case, the set of includes in C might well have changed. We do
not bother to detect precisely which includes were added or removed.
Instead we create another internal node C-includes-2. Then we determine
what actions should be run to update the target. In fact this means that
we perform the first stage logic when already in the execution stage.

After actions for C-includes-2 are determined, we add C-includes-2 to
the list of A's dependencies, and stage 2 proceeds as usual.
Unfortunately, we cannot do the same with target B: since B has not
been visited yet, target C does not know that B depends on it. So, we
add a flag to C marking it as rescanned. When visiting the B target, the
flag is noticed and C-includes-2 is added to the list of B's
dependencies as well.
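
The bookkeeping for A, B and C can be modelled as below. This Python
sketch is a simplification with hypothetical names: the "flag" is
represented by a reference to the new includes node, the scan is a
callback returning header names, and no actions are actually run.

```python
# Toy model of rescanning a target after it has been updated.

class Node:
    def __init__(self, name, depends=()):
        self.name = name
        self.depends = list(depends)
        self.rescanned = None       # the "flag": new includes node, if any


def after_update(target, visiting_parent, scan):
    """Called when `target` was just updated while building `visiting_parent`."""
    headers = [Node(header) for header in scan(target)]
    includes = Node(target.name + "-includes-2", headers)
    target.rescanned = includes
    visiting_parent.depends.append(includes)


def visit(node):
    """Called when a node is visited later; picks up rescans it missed."""
    for dep in list(node.depends):
        extra = dep.rescanned
        if extra is not None and extra not in node.depends:
            node.depends.append(extra)


c = Node("C")
a, b = Node("A", [c]), Node("B", [c])
after_update(c, a, scan=lambda target: ["D"])   # C updated while building A
visit(b)                                        # B notices the flag on C
```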

Note also that internal nodes are sometimes updated too. Consider this
dependency graph:

....
a.o ---> a.cpp
            a.cpp-includes -->  a.h (scanned)
                                   a.h-includes ------> a.h (generated)
                                                                 |
                                                                 |
            a.pro <-------------------------------------------+
....

Here, our handling of generated headers comes into play. Say that a.h
exists but is out of date with respect to "a.pro"; then "a.h
(generated)" and "a.h-includes" will be marked for updating, but "a.h
(scanned)" will not. We have to rescan "a.h" after it has been created,
but since "a.h (generated)" has no associated scanner, it is only
possible to rescan "a.h" after the "a.h-includes" target has been
updated.

The above considerations led to the decision to rescan a target whenever
it is updated, no matter if it is internal or not.

________________________________________________________________________________________________________
*Warning*

The remainder of this document is not intended to be read at all. This
will be rearranged in the future.
________________________________________________________________________________________________________

File targets
------------

As described above, file targets correspond to files that B2
manages. Users may be concerned about file targets in three ways: when
declaring file target types, when declaring transformations between
types and when determining where a file target is to be placed. File
targets can also be connected to actions that determine how the target
is to be created. Both file targets and actions are implemented in the
`virtual-target` module.

Types
~~~~~

A file target can be given a type, which determines what transformations
can be applied to the file. The `type.register` rule declares new types.
A file type can also be assigned a scanner, which is then used to find
implicit dependencies. See "link:#bbv2.arch.depends[dependency
scanning]".

Target paths
~~~~~~~~~~~~

To distinguish targets built with different properties, they are put in
different directories. The rules for determining target paths are given
below:

1.  All targets are placed under a directory corresponding to the
project where they are defined.
2.  Each non-free, non-incidental property causes an additional element
to be added to the target path. That element has the form
`<feature-name>-<feature-value>` for ordinary features and
`<feature-value>` for implicit ones. [TODO: Add note about composite
features].
3.  If the set of free, non-incidental properties is different from the
set of free, non-incidental properties for the project in which the main
target that uses the target is defined, a part of the form
`main_target-<name>` is added to the target path. **Note:** It would be
nice to completely track free features also, but this appears to be
complex and not really needed.

For example, we might have these paths:

....
debug/optimization-off
debug/main-target-a
....
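
Rule 2 on its own is easy to express in code. The Python sketch below
covers only that rule (the project directory of rule 1 and the main
target element of rule 3 are left out), and the feature table with its
`free`, `incidental` and `implicit` flags is a hypothetical stand-in for
real feature declarations.

```python
# Sketch of rule 2 only: path elements from non-free, non-incidental properties.

def target_path(properties, features):
    elements = []
    for name, value in properties:
        attributes = features[name]
        if attributes["free"] or attributes["incidental"]:
            continue                # such properties add no path element
        if attributes["implicit"]:
            elements.append(value)
        else:
            elements.append(f"{name}-{value}")
    return "/".join(elements)


# Hypothetical feature declarations used only for this example.
features = {
    "variant": {"free": False, "incidental": False, "implicit": True},
    "optimization": {"free": False, "incidental": False, "implicit": False},
    "define": {"free": True, "incidental": False, "implicit": False},
}
path = target_path(
    [("variant", "debug"), ("optimization", "off"), ("define", "NDEBUG")],
    features,
)
```

With these inputs the result matches the first example path shown above.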
